Controlled Expression of Heterologous Proteins in the Mammary
Gland of a Transgenic Animal

TECHNICAL FIELD 

This invention relates to gene expression in mammary gland tissue.

BACKGROUND 

This application claims priority to U.S. Provisional Patent Application No. 60/117,690.

The field of transgenics has grown rapidly since the initial experiments describing the
introduction of foreign DNA into the developing zygote or embryo (Brinster, R.L., et al., Proc.
Natl. Acad. Sci. USA 82:4438-4442 (1985); Wagner et al., U.S. 4,873,191 (1989)).
Transgenic technology has been applied to both laboratory and domestic species for the study
of human diseases (Synder, B.W., et al., Mol. Reprod. and Develop. 40:419-428 (1995)),
production of pharmaceuticals in milk (Ebert, K.M. and J.P. Selgrath, "Changes in Domestic
Livestock through Genetic Engineering" in Applications in Mammalian Development, Cold
Spring Harbor Laboratory Press, 1991), development of improved agricultural stock (see, for
example, Ebert, K.M. et al., Animal Biotechnology 1:145-159 (1990)), and xenotransplantation
(Osman, N., et al., Proc. Natl. Acad. Sci. USA 94:14677-14682 (1997)). A crucial step in the
development of transgenic animals is the construction of the vector or cassette to be
microinjected. The ultimate utility or value of the transgenic animal depends on the
specificity and strength of the promoter used to express the gene of interest. This is
particularly evident when the mammary gland of transgenic animals is used for the production
of pharmaceuticals.

Researchers aiming to produce pharmaceuticals in the milk of lactating
transgenic animals focused on the cloning and characterization of the genes associated with
the major milk proteins from the domestic species and common laboratory animals. For
example, the genes for goat beta casein (Roberts, B. et al., Gene 121:255-262 (1992)) and
sheep beta lactoglobulin (Simons, J.P. et al., Nature 328:530-532 (1987)) were isolated and
used to produce transgenic mice to demonstrate the ability to direct expression to the
mammary gland. In both cases, the protein product was detected in the milk; however,
expression was highly variable and not completely limited to the mammary gland. These
experiments clearly demonstrated that crucial control elements were not present in the vectors
to correctly direct expression of the gene. This was further illustrated when a heterologous
protein coding sequence was attached to a milk specific promoter (Wright, G., et al.,
Biotechnology 9:830-834 (1991); Ebert, K.M. et al., Biotechnology 9:835-838 (1991)). In
addition to the problem of inconsistent or non-tissue specific expression, researchers found
that some transgenic animals over-expressed the target protein, which caused problems with
milk production (Shamay, A., et al., Transgenic Research 1:124-132 (1992); Ebert, K.M., et
al., Biotechnology 12:699-702 (1994)). This limits the commercial utility of the transgenic
production system because many commercially valuable proteins are enzymes, growth factors,
or even toxins.



SUMMARY OF THE INVENTION

The invention provides a solution to the longstanding problem of inefficient or variable
tissue-specific expression of heterologous genes in mammary gland tissue. Accordingly, the
invention features transcription regulatory elements derived from a milk specific promoter,
e.g., a mammalian lactoferrin gene promoter. An isolated nucleic acid within the invention
contains a promoter region derived from the human lactoferrin gene operably linked to a
heterologous sequence. A heterologous sequence is one that does not encode a naturally
occurring lactoferrin polypeptide. The promoter region includes at least 20 nucleotides of the
nucleotide sequence of SEQ ID NO:1. For example, the promoter region contains nucleotides
1-154 of SEQ ID NO:1 or 2.

Table 1: Human Lactoferrin promoter region

1 CTGGATCCTCAAGGAACAAGTAGACCTGGCCGCGGGGAGT
41 GGGGAGGGAAGGGGTGTCTATTGGGCAACAGGGCGGCAAA
81 GCCCTGAATAAAGGGGCGCAGGGCAGGCGCAAGTGCAGAG
121 CCTTCGTTTGCCAAGTCGCCTCGAGACCGCAGACATGAAA
161 GCATGTCTCCGCGGAAAA (SEQ ID NO: 1)

The BamHI restriction site GGATCC (nucleotides 3-8) and the XhoI site CTCGAG (nucleotides
140-145) are italicized. These restriction sites may be altered, e.g., replaced with other
restriction sites or with nucleotides that do not represent restriction enzyme recognition sites.

Table 2: Human Lactoferrin promoter region

1 CTXXXXXXTCAAGGAACAAGTAGACCTGGCCGCGGGGAGT
41 GGGGAGGGAAGGGGTGTCTATTGGGCAACAGGGCGGCAAA
81 GCCCTGAATAAAGGGGCGCAGGGCAGGCGCAAGTGCAGAG
121 CCTTCGTTTGCCAAGTCGCXXXXXXACCGCAGACATGAAA
161 GCATGTCTCCGCGGAAAA (SEQ ID NO:2)
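
The relationship between Tables 1 and 2 can be checked programmatically. The following Python sketch locates the BamHI (GGATCC) and XhoI (CTCGAG) recognition sites in SEQ ID NO:1 and masks them with X, which should give the SEQ ID NO:2 form. The sequence string is transcribed from the tables above; the function names are illustrative assumptions and are not part of the disclosure.

```python
# SEQ ID NO:1 as transcribed from Table 1 (178 nucleotides).
SEQ_ID_NO_1 = (
    "CTGGATCCTCAAGGAACAAGTAGACCTGGCCGCGGGGAGT"
    "GGGGAGGGAAGGGGTGTCTATTGGGCAACAGGGCGGCAAA"
    "GCCCTGAATAAAGGGGCGCAGGGCAGGCGCAAGTGCAGAG"
    "CCTTCGTTTGCCAAGTCGCCTCGAGACCGCAGACATGAAA"
    "GCATGTCTCCGCGGAAAA"
)

# Standard recognition sequences of the two enzymes named in the text.
SITES = {"BamHI": "GGATCC", "XhoI": "CTCGAG"}


def site_span(seq, site):
    """Return the 1-based (start, end) of the first occurrence of site, or None."""
    i = seq.find(site)
    return None if i < 0 else (i + 1, i + len(site))


def mask_sites(seq, sites):
    """Replace every occurrence of each recognition site with a run of X."""
    masked = seq
    for site in sites:
        masked = masked.replace(site, "X" * len(site))
    return masked


spans = {name: site_span(SEQ_ID_NO_1, site) for name, site in SITES.items()}
masked = mask_sites(SEQ_ID_NO_1, SITES.values())

print(spans)
print(masked)
```

Masking rather than deleting keeps the nucleotide numbering unchanged, so positions such as 1-154 refer to the same residues in both sequences.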

Optionally, the lactoferrin-derived promoter regions described above are linked to
nucleotides 1-1176 of the nucleotide sequence of SEQ ID NO:16 (GENBANK™ accession
no. S52659).

Table 6

1 cgaggatcat ggctcactgc caccttcatc tcccaggctc aaatggtcct cccactttag 

61 cctcccaagt agctgggacc ataggcatac accaccatgc tgggctaatt tttgtatttt 

121 ttgtagagat gggggtttcc ctatgaagcc caggctagtc ttgaactcct gggctcaagc 

181 gatcctccca tcttggcctc ccaaagtgct gggattacag gcatgagcca ctgtgccctg 

241 cctagttact cttgggctaa gttcacatcc atacacacag gatattcttt ctgaggcccc 

301 caatgtgtcc cacaggcacc atgctgtatg tgacactccc ctagagatgg atgtttagtt 

361 tgcttccaac tgattaatgg catgcagtgg tgcctggaaa catttgtacc tggggtgctg 

421 tgtgtcatgg gaatgtattt acgagatgta ttcttagaag cagtattcta gcttttgaat 

481 tttaaaatct gacatttatg gcgattgtta aaatgaggtt accatttcct attgaatact 

541 atcaacacca aaaaagaaga aggaggagat ggagaaaaaa aagacaaaaa aaaaaaaagt 

601 ggtagggcat cttagccata gggcatcttt ctcattggca aataagaaca tggaaccagc 

661 cttgggtggt ggccattccc ctctgaggtc cctgtctgtt ttctgggagc tgtattgtgg 

721 gtctcagcag ggcagggaga taccccatgg gcagcttgcc tgagactctg ggcagcctct 

781 cttttctctg tcagctgtcc ctaggctgct gctgggggtg gtcgggtcat cttttcaact 

841 ctcagctcac tgctgagcca aggtgaaagc aaacccacct gccctaactg gctcctaggc 

901 accttcaagg tcatctgctg aagaagatag cagtctcaca ggtcaaggcg atcttcaagt 

961 aaagaccctc tgctctgtgt cctgccctct agaaggcact gagaccagag ctgggacagg 

1021 gctcaggggg ctgcgactcc taggggcttg cagacctagt gggagagaaa gaacatcgca 

1081 gcagccaggc agaaccagga caggtgaggt gcaggctggc tttcctctcg cagcgcggtg 
1141 tggagtcx;tg tcctgcctca gggcttttcg gagcctggat cctcaaggaa caagtagacc
1201 tggccgcggg gagtggggag ggaaggggtg tctattgggc aacagggcgg ggcaaagccc
1261 tgaataaagg ggcgcagggc aggcgcaagt ggcagagcct tcgtttgcca agtcgcctcc
1321 agaccgcaga catgaaactt gtcttcctcg tcctgctgtt cctcggggcc ctcggtgagt
1381 gcaggtgcct gggggcgcga gccgcctgat gggcgtctcc tgcgccctgt ctgctaggcg
1441 ctttggtccc tgtgtccggt tggctgggcg cggggtctct gcgccccgcg gtcccagcgc
1501 ctacagccgg gaggcggccc ggacgcgggg ccagtctctt tcccacatgg ggaggaacag
1561 gagctgggct cctcaagccg gatcggggca cgcctagctc tgctcagagc ttctcaaaag
1621 gcctcccagg cccctgtccc ttlgtgtccx gcctaaggat ttggtcccca ttgtattgtg
1681 acatgcgttt tacctgggag gaaagtgagg ctcagagagg gtgagcgact agctcaagga
1741 ccctagtcca gatcctagct cctgcgagga ctgtgagacc ccagcaagac cgagccttta
1801 tgagacttag tttcttcact taaagaaacg gcctaaccat gggtccacag ggttgtgagg
1861 aggagatggg gcattcgcac accttccgtg gcagagggtt gtggaggggt gcggtgctcc
1921 tgatggaacc ctgtgtcaga gggtttgaga gggaaatgtc agccaaacag aaggaaggag
1981 cagaaggaag gaaacaattg tcagttccat aaccaaagta atttctcggg tgctcagagg
2041 gcactcccca gcgctgcaca ttagtgacct aaatgcgtga gtgcgg (SEQ ID NO: 16)

By "isolated" is meant a nucleic acid molecule that is free of the genes which, in the
naturally-occurring genome of the organism, flank the sequence of interest. The term
therefore includes, for example, a recombinant DNA which is incorporated into a vector; into
an autonomously replicating plasmid or virus; or into the genomic DNA of a procaryote or
eucaryote; or which exists as a separate molecule (e.g., a cDNA or a genomic or cDNA
fragment produced by PCR or restriction endonuclease digestion) independent of other
sequences. It also includes a recombinant DNA which is part of a hybrid gene encoding
additional polypeptide sequence. The term excludes large segments of genomic DNA,
such as those present in cosmid clones, which contain a given DNA sequence flanked by one
or more other genes which naturally flank it in a naturally-occurring genome.

The lactoferrin-derived transcription regulatory sequences are attached to a nominal
promoter (e.g., the nominal lactoferrin promoter or a heterologous promoter), which in turn
is operably linked to a sequence to be transcribed. The heterologous sequence to be
transcribed is a polypeptide-encoding sequence or antisense sequence. When incorporated
into a transgenic mammal such as a cow, the regulatory sequences of the invention operably
linked to a polypeptide-encoding sequence direct expression of a polypeptide at a level of at
least 0.1 mg/ml in milk. Preferably, the sequences direct production of the transgene product at
a level of 1-5 mg/ml in milk.

The regulatory sequences described herein may be used as a bi-directional promoter 
capable of exerting its function independently of its orientation in relation to the nucleic acid
to be transcribed. A nucleic acid according to the invention is obtained by any technique in 
use in the art, for example by cloning, hybridization with the aid of an appropriate probe, by 
Polymerase Chain Reaction (PCR), or by chemical synthesis. 

The nucleic acid of the invention includes an RNA stabilization sequence and/or a
polyadenylation (poly A) sequence. Such stabilization or poly A sequences are preferably
operatively linked to the heterologous nucleic acid sequence at the 3' end of the sequence to
be transcribed. The heterologous nucleic acid to be transcribed preferably encodes insulin,
calcitonin, serum albumin, a tetrameric antibody, an Fab fragment, a single chain antibody, a
plasma protein, an industrial enzyme, silk, or a membrane receptor. The RNA stabilization
sequence includes nucleotides 424-1058 of SEQ ID NO:3 or 4.

Table 3: 3' Region of Human Lactoferrin Gene

1 CAGGNTGGCCCAGTAAGGATTCCTGNGAATGAATTGAGTG 

41 AATCTGCCAGGTGAACATGGATTGCAAACCGGGTTCACAT 

81 TCCCCGGNAGAAGCTAGAGGNCCCACCCAATTTCTTGTGA 

121 ACTTGAGAATGTGACAGTCGATTCAATCAGAGACAAGTGC 

161 AGGGTGGTTGTGTCTCTCAGGCCAGAGCAGGGAAACACCC 

201 TGGCTGGTGAGGGCTAGACTCTGGCTCCCTTGAACACCGT 

241 AGTCGCTAGGAGTAGGGGAGTGGGAATATGAGTGTGGCAA 

281 GCACTGACTCAGTGATGGGAGAAGGGCAGAGAAAACTCTT

321 AGTATTCTCTTTGATTTATTGGATTAAATAACTGGTTTAA 

361 TGGAAGAAATCAGTTTCTGAATCTCTTGCTCTGTTGTGTC 

401 CCACAGCCCTCCTGGAAGCCTGTGAATTCCTCAGGAAGTA 

441 AAACCGAAGAAGATGGCCCAGCTCCCCAAGAAAGCCTCAG 

481 CCATTCACTGCCCCCAGCTCTTCTCCCCAGGTGTGTTGGG

521 GCCTTGGCCTCCCCTGCTGAAGGTGGGGATTGCCCAT 
561 CCATCTGCTTACAATTCCCTGCTGTCGTCTTAGCAAGAAG
601 TAAAATGAGAAATTTTGTTGATATTCTCTCCTTATAAAGT
641 GTCACTCATCTTTTCTAGAATTTTATACTGAAATCACATG
681 CCTGACAAAATACCTGTACAGTTGGACCTTCCCTTCCAAG
721 TTTTCAGGTCCAGCCCCTCCTCTTTCTTGCAGTCTTGGGT
761 ATGATGCCCAAGGGTCTGGAATTTAAGGCCAGGCCAAGCA
801 CCGGTTTTCCTAAGGGGATCTTGGTGGGTTATTCACATAG
841 CTGGCTCANTGCACGTGCATGTATGTGCCTGGGAATGTNT
881 GCCNTGTCCCCAAGGCAGGGCAGGGAAAGACCAAGGCCTT
921 GGGAAATTATTAACNGGAAANNTANGGGTTCCAANTNGCC
961 NCAATCNCHTTGCNNAAGTCCTAAATTTAACCAAGANCCT
1001 NGGGTTGGGGTTTAAAAAGGGGGACCTTTTAATTCCCNAA

1041 AGNTTCCCCTTAGGGGGG TGCGACAAGCCGC

CGAAAGTTCCTCGAAGCTAGCTTCAGACGTGTCTAGA
(SEQ ID NO: 3); bold type indicates nucleotides in exon 17 of human lactoferrin gene;
. indicates a gap.

Table 4: Lactoferrin-derived RNA stabilization sequence 

XXXXXXXAATTCCTCAGGAAGTA 

AAACCGAAGAAGATGGCCCAGCTCCCCAAGAAAGCCTCAG 

CCATTCACTGCCCCCAGCTCTTCTCCCCAGGTGTGTTGGG 

GCCTTGGCCTCCCCTGCTGAAGGTGGGGATTGCCCAT 

CCATCTGCTTACAATTCCCTGCTGTCGTCTTAGCAAGAAG 

TAAAATGAGAAATTTTGTTGATATTCTCTCCTTATAAAGT 

GTCACTCATCTTTTCTAGAATTTTATACTGAAATCACATG 

CCTGACAAAATACCTGTACAGTTGGACCTTCCCTTCCAAG 

TTTTCAGGTCCAGCCCCTCCTCTTTCTTGCAGTCTTGGGT 

ATGATGCCCAAGGGTCTGGAATTTAAGGCCAGGCCAAGCA 

CCGGTTTTCCTAAGGGGATCTTGGTGGGTTATTCACATAG 

CTGGCTCANTGCACGTGCATGTATGTGCCTGGGAATGTNT 

GCCNTGTCCCCAAGGCAGGGCAGGGAAAGACCAAGGCCTT 
GGGAAATTATTAACNGGAAANNTANGGGTTCCAANTNGCC
NCAATCNCHTTGCNNAAGTCCTAAATTTAACCAAGANCCT
NGGGTTGGGGTTTAAAAAGGGGGACCTTTTAATTCCCNAA 
AGNTTCCCCTTAGGGGGG (SEQ ID NO:4)

The stabilization sequence may optionally include the nucleotide sequence
TGCGACAAGCCGCCGAAAGTTCCTCGAAGCTAGCTTCAGACGTGTCTAGA (SEQ ID NO:5).

Also within the invention is an isolated nucleic acid containing a lactoferrin-derived
dominant control region (DCR) in the presence or absence of a lactoferrin-derived promoter
sequence. A DCR is a nucleic acid sequence which directs consistent-level, site of
integration-independent, copy number-dependent expression of a nucleic acid operably linked thereto.
For example, a DCR derived from genomic DNA located 5' or 3' to the transcription start site
of lactoferrin directs transcription of a transgene product in mammary gland tissue of a
transgenic mammal. Alternatively, the DCR confers inducibility on a polypeptide-encoding
sequence to which it is linked. Preferably, the DCR regulates tissue-specific transcription of a
heterologous nucleic acid sequence; the regulation of transcription by the DCR is position
independent relative to the location of the heterologous nucleic acid sequence. For example,
the DCR is located 5' or 3' to the sequence to be transcribed. An increase in the level of
transcription of a heterologous nucleic acid sequence under the control of a DCR is directly
proportionate to the number of copies of the DCR.

A nucleic acid is a nucleotide polymer, e.g., a DNA or RNA. Preferably, the nucleic
acid is a double-stranded DNA.

The details of one or more embodiments of the invention are set forth in the accompanying
drawings and the description below. Other features, objects, and advantages of the
invention will be apparent from the description and drawings, and from the claims.

DESCRIPTION OF DRAWINGS 

Fig. 1 is a diagram of the human lactoferrin gene locus and a representation of 
overlapping BAC clones. The shaded box represents the lactoferrin coding sequence and the 
hatched box represents dominant control regions. 

Fig. 2A is a diagram of human lactoferrin PAC clones, and Fig. 2B is a diagram
of human lactoferrin PAC subclones. B = BamHI, R = EcoRI, Sp = SphI, X = XbaI, Xh =
XhoI.

Fig. 3 is a diagram of the construction strategy for a human lactoferrin expression
cassette. B = BamHI, R = EcoRI, N = NotI, S = SalI, Sp = SphI, X = XbaI, Xh = XhoI.

DETAILED DESCRIPTION 

Human lactoferrin genomic DNA was cloned, and a milk specific expression cassette was
constructed utilizing human lactoferrin promoter sequences and other lactoferrin-derived
enhancer and regulatory elements. Lactoferrin is found in concentrations of at least 2 mg/ml
in human breast milk, which makes it a minor component of milk (Masson, P.L. and Heremans,
J.F., Comp. Biochem. Physiol. 39B:119-129 (1971)). The lactoferrin promoter is a moderate
strength promoter when compared to the casein promoters, which direct high level expression
of casein (10-20 mg/ml). In addition, the human lactoferrin promoter is somewhat unique
compared to lactoferrin promoters of other species, which direct dramatically lower levels of
lactoferrin in milk. The human lactoferrin promoter is an optimal promoter for directing
expression of heterologous proteins in mammary gland tissue of transgenic animals.

The human lactoferrin locus (Fig. 1) was isolated from commercially available
human bacterial artificial chromosome (BAC) and human P1 artificial chromosome (PAC)
libraries. Due to the unique nature of the BAC and PAC clones, the entire locus was covered
in 2-5 individual clones. Each clone is capable of holding 75-150 kb of genomic DNA, unlike
cosmid vectors, which can only hold 30-40 kb. The clones from the different libraries were
characterized by restriction analysis and Southern blotting to ensure that overlapping clones
were isolated (Fig. 1). These overlapping clones were used to construct a milk specific
expression cassette and to isolate the dominant control region for the locus.
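
Once clones have been placed on a common map, whether they cover a locus with overlap between neighbors can be checked mechanically. The following is a minimal Python sketch under that assumption; the clone names and kb coordinates are hypothetical placeholders chosen to fall within the 75-150 kb insert range, not data from this work.

```python
def covers_with_overlap(clones, locus_start, locus_end):
    """Return True if the clones tile [locus_start, locus_end] and each
    clone needed for coverage overlaps the region already covered.

    clones maps a clone name to its (start_kb, end_kb) map position.
    """
    intervals = sorted(clones.values())
    if not intervals or intervals[0][0] > locus_start:
        return False
    reach = intervals[0][1]
    for start, end in intervals[1:]:
        if reach >= locus_end:
            break  # locus already covered; later clones are irrelevant
        if start >= reach:
            return False  # gap or mere abutment, so no shared sequence
        reach = max(reach, end)
    return reach >= locus_end


# Hypothetical map positions (kb) for three overlapping clones.
example_clones = {
    "clone_A": (0, 120),
    "clone_B": (90, 200),
    "clone_C": (180, 300),
}
print(covers_with_overlap(example_clones, 10, 290))
```

Requiring a strict overlap, rather than abutting ends, mirrors the reason overlap is confirmed experimentally: shared restriction fragments are what tie adjacent clones together.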

The human lactoferrin gene, along with 20-30 kb of surrounding flanking
sequence, was subcloned from one of the artificial chromosome vectors into a cosmid vector.
The gene was engineered to delete the protein coding sequence and add unique cloning sites
for the addition of heterologous protein coding sequences. The human lactoferrin promoter is
used to direct expression of foreign proteins to the milk of transgenic non-human mammals.
The promoter is attached to either genomic or cDNA protein coding sequences. The human
lactoferrin 3' flanking sequence or a 3' flanking sequence of any other gene is inserted into the
expression cassette or vector to ensure stable mRNA expression and polyadenylation. For
example, the 3' flanking sequence is derived from the 3' flanking region of actin, albumin, or
butyrophilin.

The transcription unit of the transgene expression system of the invention contains
DNA sequences encoding a transgene, any expression control sequences such as a promoter or
enhancer, a polyadenylation element, and any other regulatory elements that may be used to
modulate or increase expression, all of which are operably linked in order to allow expression
of the transgenic polypeptide.

Preferably, the human lactoferrin promoter regulatory DNA is used to control
expression of a transgene in a transcription unit, or a truncated fragment of this promoter
which functions analogously may be used. The lactoferrin-derived regulatory sequence, e.g.,
promoter sequence or DCR, is positioned 5' to a heterologous nucleic acid sequence, e.g., a
transgene, in a transcription unit. Portions of the lactoferrin-derived promoter region are
tested for their ability to allow tissue-specific and elevated expression of a transgene using
assays known in the art, e.g., standard reporter gene assays using luciferase,
beta-galactosidase, or expression of an antibiotic resistance gene as a detectable marker for
transcription. All or part of one of the nucleotide sequences specified in a reference sequence,
e.g., SEQ ID NO:1 or 2, its complementary strand, or a variant thereof may be used to direct
transcription of a heterologous nucleic acid sequence such as a transgene in a transgenic
mammal. A nucleic acid fragment is a portion of at least 20 continuous nucleotides identical
to a portion of length equivalent to one of the reference nucleotide sequences or to its
complement.

The invention includes sequences which hybridize under stringent conditions with all
or part of the sequence reported in a reference sequence and retain transcription regulatory
function. For example, the nucleic acid may contain one or more sequence modifications in
relation to a reference sequence. Such modifications may be obtained by mutation, deletion
and/or addition of one or more nucleotides compared to the reference sequence. Modifications
are introduced to alter the activity of the regulatory sequence, e.g., to improve promoter
activity, to suppress a transcription inhibiting region, or to make a constitutive promoter
regulatable or vice versa. Modifications are also made to introduce a restriction site facilitating
subsequent cloning steps, or to eliminate sequences which are not essential to the
transcriptional activity. Preferably, a modified sequence is at least 70% (more preferably at
least 80%, more preferably at least 90%, more preferably at least 95%, more preferably at least
99%) identical to a reference sequence. The modifications do not substantially alter the
transcription promoter function associated with the reference sequence (or a
naturally-occurring lactoferrin promoter sequence). For example, modifications are engineered to avoid
the site of initiation of translation.

Nucleotide and amino acid comparisons are carried out using the Lasergene software
package (DNASTAR, Inc., Madison, WI). The MegAlign module was used with the Clustal V
method (Higgins et al., 1989, CABIOS 5(2):151-153). The parameters used were gap penalty
10, gap length penalty 10.
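
The percent-identity tiers above can be illustrated with a simple calculation. The Python sketch below computes ungapped identity between two pre-aligned, equal-length sequences; it is a simplification assumed for illustration and is not the Clustal V alignment performed by MegAlign. The two 40-nucleotide strings are transcribed from this document: the native region from Table 6 and the corresponding engineered region (nucleotides 121-160 of SEQ ID NO:1), which differ at the introduced XhoI site.

```python
THRESHOLDS = (70, 80, 90, 95, 99)  # percent-identity tiers named in the text


def percent_identity(ref, other):
    """Ungapped percent identity of two pre-aligned, equal-length sequences."""
    if len(ref) != len(other):
        raise ValueError("sequences must be pre-aligned to equal length")
    matches = sum(a == b for a, b in zip(ref.upper(), other.upper()))
    return 100.0 * matches / len(ref)


def tiers_met(pid):
    """Return the identity thresholds that a given percent identity satisfies."""
    return [t for t in THRESHOLDS if pid >= t]


# Native promoter region (from Table 6) and engineered region (SEQ ID NO:1, nt 121-160).
NATIVE = "CCTTCGTTTGCCAAGTCGCCTCCAGACCGCAGACATGAAA"
ENGINEERED = "CCTTCGTTTGCCAAGTCGCCTCGAGACCGCAGACATGAAA"

pid = percent_identity(NATIVE, ENGINEERED)
print(pid, tiers_met(pid))
```

For sequences of different lengths or with insertions and deletions, a gapped alignment would be needed before counting matches.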

Alternatively, the nucleic acids described herein hybridize at high stringency to a
strand of DNA having the reference sequence, or the complement thereof, and have
transcription regulatory activity. Hybridization is carried out using standard techniques, such
as those described in Ausubel et al. (Current Protocols in Molecular Biology, John Wiley &
Sons, 1989). "High stringency" refers to nucleic acid hybridization and wash conditions
characterized by high temperature and low salt concentration, i.e., hybridization at 42 degrees
C in 50% formamide; a first wash at 65 degrees C in 2X SSC and 1% SDS; followed by a
second wash at 65 degrees C in 0.2X SSC and 0.1% SDS. Lower stringency conditions,
suitable for detecting DNA sequences having about 50% sequence identity to a reference gene
or sequence, are, for example, hybridization at 42 degrees C in the absence of
formamide; a first wash at 42 degrees C in 6X SSC and 1% SDS; and a second wash at 50
degrees C in 6X SSC and 1% SDS.

Techniques to evaluate whether a variant has promoter activity or transcription
regulatory activity are known in the art. For example, the sequence to be tested is inserted
upstream of a reporter gene whose expression is detectable (e.g., β-galactosidase, catechol
oxygenase, luciferase or a gene conferring resistance to an antibiotic). The promoter activity
or transcription regulatory activity is at least 50% (more preferably 60%, more preferably
70%, more preferably 80%, more preferably 90%, more preferably 95%, more preferably
99%, and most preferably 100%) of that associated with the reference sequence (or a
naturally-occurring lactoferrin promoter or DCR). A sequence may also be modified so that
the promoter activity or transcription regulatory activity is greater than that associated with the
reference sequence (or a naturally-occurring lactoferrin promoter or DCR). For example, an
increase in promoter activity is at least twice that of the naturally-occurring lactoferrin sequence.
In another example, an increase in transcriptional activity is directly proportionate to the
number of copies of a given regulatory sequence, e.g., a DCR. Thus, a transcription unit or
expression cassette may contain two or more copies of a regulatory sequence such as a DCR in
tandem to increase production of a desired gene product.
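
The activity comparison described above amounts to expressing a variant's reporter signal as a percentage of the reference signal. A minimal Python sketch follows; the background-subtraction step, the function names, and the luciferase readings are hypothetical assumptions for illustration, while the 50% and two-fold cut-offs are taken from the text.

```python
def relative_activity(test_signal, reference_signal, background=0.0):
    """Reporter activity of a variant as a percentage of the reference."""
    reference = reference_signal - background
    if reference <= 0:
        raise ValueError("reference signal must exceed background")
    return 100.0 * (test_signal - background) / reference


def classify(percent):
    """Bucket a relative activity using the cut-offs stated in the text."""
    if percent >= 200:
        return "enhanced"  # at least twice the reference activity
    if percent >= 50:
        return "retained"  # at least 50% of the reference activity
    return "below threshold"


# Hypothetical luciferase readings (arbitrary light units).
REFERENCE, BACKGROUND = 1000.0, 100.0
readings = {"variant_1": 600.0, "variant_2": 2000.0, "variant_3": 400.0}

results = {
    name: classify(relative_activity(signal, REFERENCE, BACKGROUND))
    for name, signal in readings.items()
}
print(results)
```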

The components of a transgene expression system are delivered to a cell on one or
more vectors, which include, but are not limited to, plasmids and viruses. One or more
transcription units may be provided on a plasmid, where a lactoferrin-derived promoter region
is used to control expression and is positioned 5' to a transgene.

Example 1: Identification of Dominant Control Regions (DCR)
In addition to the construction of a milk specific expression cassette, the isolated
genomic clones are used to screen for a dominant control region (DCR) necessary for position
independent, copy number dependent expression. To screen for a DCR, the two most distal
clones which contain an intact lactoferrin coding sequence are isolated from all bacterial
sequences and microinjected into a mammalian transgenic model, e.g., mouse or rat embryos,
to produce transgenic animals. Any clone containing a DCR will produce equivalent amounts
of human lactoferrin in the milk of all transgenic lines tested. Once a DCR has been localized
to a single BAC or PAC clone, the DCR can be further localized by deletional analysis and the
production of additional transgenic animals. Once the DCR has been localized to a 5-10 kb
region, this region can be connected to the lactoferrin promoter cassette to direct position
independent, copy number dependent expression. Although this procedure is described for the
human lactoferrin locus, the technique is applicable to any locus such as, but not limited to,
casein or lactoglobulin loci.
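
One way to judge whether expression across transgenic lines is copy number dependent is to normalize each line's milk expression by its transgene copy number and look at the spread. The Python sketch below does this with a coefficient of variation; the 0.25 cut-off, the line names, and all measurements are hypothetical assumptions for illustration, not results from this work.

```python
from statistics import mean, pstdev


def per_copy_expression(lines):
    """Map each line name to its expression per transgene copy.

    lines maps a line name to (copy_number, expression_mg_per_ml).
    """
    return {name: expr / copies for name, (copies, expr) in lines.items()}


def coefficient_of_variation(values):
    """Population standard deviation divided by the mean."""
    values = list(values)
    return pstdev(values) / mean(values)


def looks_copy_number_dependent(lines, max_cv=0.25):
    """True if per-copy expression is roughly constant across lines.

    max_cv is an arbitrary illustrative cut-off, not a value from the text.
    """
    return coefficient_of_variation(per_copy_expression(lines).values()) <= max_cv


# Hypothetical lines: (copy number, lactoferrin in milk, mg/ml).
with_dcr = {"line_1": (1, 0.5), "line_2": (2, 1.1), "line_3": (4, 1.9)}
without_dcr = {"line_4": (1, 0.9), "line_5": (3, 0.3), "line_6": (6, 0.6)}

print(looks_copy_number_dependent(with_dcr))
print(looks_copy_number_dependent(without_dcr))
```

A roughly constant per-copy value across independent integration sites is what position independent, copy number dependent expression would predict.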

The lactoferrin-derived regulatory sequences described herein are useful to
direct expression of a transgene in mammary gland tissue of a transgenic non-human mammal.
The mammary gland is used as a bioreactor to produce commercially valuable proteins. The
methods described herein are used to clone the human lactoferrin gene and surrounding
dominant control elements of the lactoferrin gene as well as casein and whey protein loci to
obtain consistent tissue-specific expression of heterologous proteins in mammary gland tissue.

Example 2: Isolation of Genomic Human Lactoferrin Clones
A milk specific promoter construct containing lactoferrin-derived transcription
regulatory sequences is used for the production of foreign proteins in the milk of transgenic
non-human mammals. The human lactoferrin gene was cloned and regulatory sequences
modified for use as a promoter. The strategy described below is useful for isolating a milk
specific dominant control region from any milk gene locus.

Human BAC and PAC libraries were purchased from Genome Systems, Inc., St. Louis,
MO and were pre-blotted onto filters for screening. The filters were probed with
oligonucleotides complementary to the first and last exons of the lactoferrin gene. Reference
sequences were obtained through the GENBANK™ system. All clones isolated were
characterized by restriction analysis and Southern blotting to determine regions of overlap.

Table 4: Oligonucleotides Used to Screen Human PAC Genomic Library
3' mRNA primers:

HLAC5 5'-GGAAGCCTGTGAATTCCTCAGGAA-3' (SEQ ID NO:6)
HLAC6 5'-GCAGGGAATTGTAAGCAGATGGAT-3' (SEQ ID NO:7)

Promoter primers:

HLAC12 5'-CCTTGAGGATCCAGGCTCCGAA-3' (SEQ ID NO:8)
HLAC13 5'-GAAGATAGCAGTCTCACAGGTCAA-3' (SEQ ID NO:9)
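
Where each screening primer anneals can be confirmed by searching the published sequence for the primer or its reverse complement. The Python sketch below does this on two short excerpts transcribed from this document (nucleotides 401-480 of SEQ ID NO:3 and nucleotides 1161-1200 of SEQ ID NO:16); the function names and the `offset` parameter are illustrative assumptions.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_primer_site(template, primer, offset=1):
    """Locate a primer on a template.

    Returns (strand, position): "+" if the primer matches the template as
    written, "-" if its reverse complement does. position is the leftmost
    template coordinate of the site, numbered from offset. Returns None if
    the primer is not found.
    """
    t, p = template.upper(), primer.upper()
    i = t.find(p)
    if i >= 0:
        return ("+", i + offset)
    i = t.find(reverse_complement(p))
    if i >= 0:
        return ("-", i + offset)
    return None


# Excerpt of SEQ ID NO:3, nucleotides 401-480 (Table 3).
SEQ3_401_480 = (
    "CCACAGCCCTCCTGGAAGCCTGTGAATTCCTCAGGAAGTA"
    "AAACCGAAGAAGATGGCCCAGCTCCCCAAGAAAGCCTCAG"
)
# Excerpt of SEQ ID NO:16, nucleotides 1161-1200 (Table 6).
SEQ16_1161_1200 = "gggcttttcggagcctggatcctcaaggaacaagtagacc"

HLAC5 = "GGAAGCCTGTGAATTCCTCAGGAA"
HLAC12 = "CCTTGAGGATCCAGGCTCCGAA"

print(find_primer_site(SEQ3_401_480, HLAC5, offset=401))
print(find_primer_site(SEQ16_1161_1200, HLAC12, offset=1161))
```

A "+" result marks a forward primer and a "-" result a reverse primer relative to the strand shown in the tables.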

Genomic clones containing the human lactoferrin gene were isolated using DOWN TO
EARTH™ human PAC DNA pools purchased from Genome Systems, Inc. (St. Louis, MO).
The human PAC DNA pools are arrayed in 20 microtiter dishes which can be screened using
consecutive rounds of PCR to identify individual clones of interest. The PAC library was
constructed by ligating a partial Sau3A I digest of human DNA into the vector pAd10SacBII.
The pAd10SacBII vector is a low-copy P1 phage-derived artificial chromosome vector capable of
replicating inserts with an average size of 120 kb in the appropriate bacterial host. The vector is
designed with T7 and SP6 promoters to enable sequencing of isolated clones and for
chromosome walking in order to isolate entire gene loci or gene families.

In order to isolate the human lactoferrin gene, oligonucleotides were designed which
were complementary to the promoter region (sequence derived from GENBANK™ Accession
# S52659) and the 3' end of the human lactoferrin mRNA (sequence derived from
GENBANK™ Accession # X53961) for use in a polymerase chain reaction (see Table 4). The
PCR primers were tested utilizing human genomic DNA and found to generate PCR fragments
of the predicted size. The primers HLAC5 and HLAC6 were then used to screen the human
PAC DNA pools, and two positive clones were identified. The two clones were localized to
wells 94K13 and 169a20 and ordered from Genome Systems, Inc. The bacterial clones were
grown under kanamycin selection and amplified using IPTG for large scale preparation
according to the manufacturer's protocol. To ensure that the clones contained the entire
human lactoferrin gene, the two clones were then screened by PCR using the HLAC12 and
HLAC13 primers. Both clones were found to contain the full length human lactoferrin gene
and were then used for restriction mapping and subcloning of the gene fragments for
construction of a mammary gland specific expression cassette.

Example 3: Construction of a Mammary Gland Specific Expression Cassette
To construct a mammary gland specific expression cassette, the promoter and 3'
flanking regions of the human lactoferrin gene were subcloned and unique restriction enzyme
sites added to allow for the addition of heterologous coding sequences and excision from the
vector backbone. A schematic representation of the two human lactoferrin clones is shown in
Fig. 2A (not drawn to scale). Each clone contained an insert of approximately 120 kb. The
human lactoferrin gene is approximately 24.5 kb in length and is divided into 17 exons (Kim
et al., Mol. Cells 8(6):663-8 (1998)). As shown in Fig. 2B, the human lactoferrin gene was
subcloned as five distinct fragments into the vectors pUC19 (New England BioLabs, Beverly,
MA) or Scl. The cosmid Scl was derived from the vector Supercos (Stratagene, La Jolla,
CA) and has a multiple cloning site (SalI-BamHI-XhoI-NotI) added between the two EcoRI
sites. The subclones were then used to reassemble a mammary gland specific expression
cassette of the human lactoferrin gene.

The promoter region was reconstructed as a SalI to XhoI fragment using the subclones
HL3 and HL10 (Fig. 3). A unique XhoI restriction site was added before the ATG initiation
codon using polymerase chain reaction mutagenesis and the oligonucleotides HLAC14 and HLAC17
(Table 5). The 500 bp PCR fragment amplified from the vector HL3 was subcloned into PvuII
digested pUC19 to form the vector HL12. The plasmid HL12 was then digested with BamHI
and XhoI to excise the human lactoferrin fragment, which was ligated into BamHI/XhoI
digested Scl to form the vector HL14. HL14 was digested with BamHI, treated with calf
intestinal alkaline phosphatase, and the 3.2 kb fragment from HL10 inserted. The orientation
of the 3.2 kb insert was determined by restriction analysis and confirmed by DNA sequencing.
The final vector was designated HL15 and contains approximately 3 kb of promoter sequence
which can be excised as a SalI to XhoI fragment.

Table 5: Oligonucleotides Used to Add an XhoI site Upstream of the Initiation Codon
HLAC14 5'-CCTTCAAGGTCGACTGCTGAAGAAGAT-3' (SEQ ID NO:10)
HLAC17 5'-CATGTCTGCGGTCTCGAGGCGACTTGGCAA-3' (SEQ ID NO:11)
HLLINK3 5'-CTAGATAAGCCGACTCCAGCAGTAACGTCGACGCGGCCGCA-3' (SEQ ID NO:12)
HLLINK4 5'-AGCTTGCGGCCGCGTCGACGTTACTGCTGGAGTCGGCTTAT-3' (SEQ ID NO:13)
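
The effect of the two mutagenic primers can be seen by comparing each with the native sequence it anneals to. The Python sketch below lists the mismatches; the native strings are transcribed from Table 6, the helper names are illustrative assumptions, and the observation that HLAC14 carries a SalI recognition sequence (GTCGAC) is read from the primer sequence itself rather than stated in the table title.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def mismatches(native, mutant):
    """List (1-based position, native base, mutant base) for each difference."""
    if len(native) != len(mutant):
        raise ValueError("sequences must be the same length")
    return [(i + 1, a, b) for i, (a, b) in enumerate(zip(native, mutant)) if a != b]


# Native regions transcribed from Table 6 (SEQ ID NO:16).
NATIVE_UPSTREAM = "CCTTCAAGGTCATCTGCTGAAGAAGAT"
NATIVE_AT_ATG = "TTGCCAAGTCGCCTCCAGACCGCAGACATG"

HLAC14 = "CCTTCAAGGTCGACTGCTGAAGAAGAT"      # forward primer
HLAC17 = "CATGTCTGCGGTCTCGAGGCGACTTGGCAA"   # reverse primer

upstream_changes = mismatches(NATIVE_UPSTREAM, HLAC14)
atg_changes = mismatches(NATIVE_AT_ATG, reverse_complement(HLAC17))

print(upstream_changes, "GTCGAC" in HLAC14)
print(atg_changes, "CTCGAG" in reverse_complement(HLAC17))
```

Because HLAC17 is a reverse primer, it is reverse-complemented before comparison so that both sequences read in the same direction as the tables.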

The 3' flanking region of the gene was subcloned as a single BamHI fragment of
over 20 kb in length which was designated HL11 (Fig. 3). Restriction analysis of the vector
HL11 revealed the presence of several XhoI sites which were removed before reconstruction
of the 3' flanking region. To remove the XhoI sites, the 3' end was further subcloned by
digestion with EcoRI or XbaI into the vector pUC19. Two overlapping clones designated
HL16 and HL24 were found to contain the stop codon and immediate 3' region of the gene.
In order to add a unique 3' restriction site, the plasmid HL16 was digested with XbaI, which
leaves the 5' fragment attached to the vector backbone, gel purified, and ligated with a
synthetic linker (Table 5, oligonucleotides HLLINK3 and HLLINK4). The correct orientation
of the linker was determined by restriction analysis and the new plasmid designated HL26.
The plasmid HL26 was then digested with EcoRI and ligated with the synthetic linker:

5'-AATTGCTCGAGC-3' (SEQ ID NO:14) 
5'-CGAGCTCGTTAA-3' (SEQ ID NO:15) 

The addition of the linker converts the EcoRI site to an XhoI site and forms the 
plasmid HL27. To complete the 3' flanking region, HL27 was digested with XbaI and SalI 
and ligated with the 7 kb XbaI/XhoI fragment from HL24. The final construct was designated 
HL28 and could be excised as an XhoI to NotI fragment approximately 7.2 kb in length. The 
XhoI/NotI fragment from HL28 was then ligated into XhoI/NotI digested HL15 to form the 
final vector HL29 (Fig. 3). 
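
The EcoRI to XhoI conversion described above can be illustrated in silico. The following is a minimal sketch, assuming the linker of SEQ ID NO:14 as transcribed, a single linker copy per ligation, and a short made-up vector fragment (TOY_VECTOR) that stands in for the EcoRI site of HL26; the function names are hypothetical and none of this is taken from the described procedure.

```python
ECORI = "GAATTC"         # EcoRI cuts G^AATTC, leaving a 5'-AATT overhang
XHOI = "CTCGAG"
LINKER = "AATTGCTCGAGC"  # SEQ ID NO:14 as transcribed, 5' to 3'

# Hypothetical stand-in sequence with one EcoRI site (not from the source).
TOY_VECTOR = "CCCGAATTCGGG"


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(comp[base] for base in reversed(seq))


def insert_linker_at_ecori(top_strand, linker):
    """Model EcoRI digestion and ligation of one linker copy (top strand only)."""
    pos = top_strand.find(ECORI)
    if pos < 0 or top_strand.find(ECORI, pos + 1) >= 0:
        raise ValueError("expected exactly one EcoRI site")
    left = top_strand[: pos + 1]   # ends with the G before the cut
    right = top_strand[pos + 1:]   # begins with the AATT overhang
    return left + linker + right


if __name__ == "__main__":
    product = insert_linker_at_ecori(TOY_VECTOR, LINKER)
    print(product)
    print("XhoI site present:", XHOI in product)
    print("EcoRI site present:", ECORI in product)
```

In this model the ligated junction reads GAATTG on one side and CAATTC on the other, so the EcoRI site is not regenerated while the XhoI site from the linker is introduced. The duplex portion of the linker (GCTCGAGC) is its own reverse complement, and the second strand as printed (SEQ ID NO:15) is the same sequence read in the opposite direction.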

A number of embodiments of the invention have been described. Nevertheless, it will 
be understood that various modifications may be made without departing from the spirit and 
scope of the invention. Other embodiments are within the scope of the following claims. 

WHAT IS CLAIMED IS: 
1. An isolated nucleic acid, comprising a promoter region derived from the human 
lactoferrin gene operably linked to a heterologous sequence, wherein said promoter region 
comprises nucleotides 1-154 of the nucleotide sequence of SEQ ID NO:2. 

2. The nucleic acid of claim 1, wherein said nucleic acid further comprises 
nucleotides 1-1176 of the nucleotide sequence of SEQ ID NO:16. 

3. An isolated nucleic acid, comprising a promoter region derived from the human 
lactoferrin gene operably linked to a heterologous sequence, wherein said promoter region 
comprises nucleotides 1-154 of the nucleotide sequence of SEQ ID NO:1. 

4. The nucleic acid of claim 2, wherein said nucleic acid further comprises 
nucleotides 1-1176 of the nucleotide sequence of SEQ ID NO:16. 

5. The nucleic acid of claim 1, wherein said heterologous sequence encodes a 
polypeptide. 

6. The nucleic acid of claim 2, wherein said heterologous sequence does not 
encode a naturally occurring lactoferrin polypeptide. 

7. The nucleic acid of claim 1, wherein said nucleic acid further comprises an 
RNA stabilization sequence. 

8. The nucleic acid of claim 7, wherein said RNA stabilization sequence 
comprises nucleotides 424-1058 of the nucleotide sequence of SEQ ID NO:3. 

9. The nucleic acid of claim 7, wherein said RNA stabilization sequence 
comprises the nucleotide sequence of SEQ ID NO:4. 

10. The nucleic acid of claim 8, wherein said RNA stabilization sequence further 
comprises the nucleotide sequence of SEQ ID NO:5. 

11. The nucleic acid of claim 9, wherein said RNA stabilization sequence further 
comprises the nucleotide sequence of SEQ ID NO:5. 

12. The nucleic acid of claim 1, wherein said nucleic acid further comprises a 
polyadenylation sequence. 

13. The nucleic acid of claim 1, wherein said heterologous sequence encodes 
insulin, calcitonin, serum albumin, a tetrameric antibody, an Fab fragment, a single chain 
antibody, a plasma protein, an industrial enzyme, silk, or a membrane receptor. 

14. An isolated nucleic acid comprising a lactoferrin-derived promoter sequence 
and a dominant control region (DCR). 

15. The nucleic acid of claim 8, wherein said DCR regulates tissue-specific 
transcription of a heterologous nucleic acid sequence, wherein regulation of transcription by 
said DCR is position independent relative to the location of said heterologous nucleic acid 
sequence. 

16. The nucleic acid of claim 8, wherein said DCR regulates transcription of a 
heterologous nucleic acid sequence, wherein an increase in the level of transcription of said 
heterologous nucleic acid sequence is directly proportionate to the number of copies of said 
DCR. 

17. A transgenic non-human mammal comprising the isolated nucleic acid of claim 1. 